adaptive similarity query
Correlation Clustering with Adaptive Similarity Queries
In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.
Reviews: Correlation Clustering with Adaptive Similarity Queries
This paper studies correlation clustering in an active learning setting where the learner does not query for all the {n choose 2} pairs of vertices. In the standard setting, the algorithm KwikCluster of Charikar et al. achieves a clustering cost of 3OPT where OPT is the cost of the best clustering. Here the cost of the clustering is defined as the number of pairs labeled 1 and put in different clusters plus the number of pairs labeled -1 and put in the same cluster. The main contribution of the paper is a variant ACC of KwikCluster that achieves a cost of 3OPT O(n 3/Q) where Q is an upper bound on the number of queries made by the algorithm. The authors also prove a matching lower bound on the error of any randomized algorithm that makes Q queries.
Correlation Clustering with Adaptive Similarity Queries
In correlation clustering, we are given n objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound.
Correlation Clustering with Adaptive Similarity Queries
Bressan, Marco, Cesa-Bianchi, Nicolò, Paudice, Andrea, Vitale, Fabio
In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound.